W3SRCE refactor [draft] #1123
base: develop
Some code moved around to group together more logically.
Wasted most of a day finding them. :(
NOT COMPLETE! CURRENTLY EXITS AFTER FIRST W3SPR4 call
Made change to allow first W3SPR4 call (when IT=0) to run using all arrays:
- Updates some local variables with CHUNKSIZE dimension.
- Some input parameters to W3SRCE now have extra NSEA[L[M]] dimension
- Some chunk element loops (CSEA) added to blocks of code
- W3SRCE only called once, rather than in seapoint loop
Chunk dimensions added to VSIN, VDIN and associated variables
Explicit loop added around calls to W3SINx routines in integration loop
Changes are currently B4B with develop (ST4 compile)
- variables were being zeroed in chunk loop
- calculations were being skipped as SRC_MASK was all true
Note: Had to recalculate the source mask after integration loop; I need to find a better way of doing this.
Commented out some unused vars
…after non-b4b results with non-homogeneous depth). Also updated PHICE.
- Required deprecation of REFLEC and REFLED locals in w3srce
- Calculation of `REFLEC(4) * BERG` moved into w3srce
- Needs #ifdefs in call to w3srce as REFLC and REFLD are not allocated if W3_REF1 not set
- Had to make IX and IY chunk arrays in w3srce
@JessicaMeixner-NOAA and @MatthewMasarik-NOAA Please note, I have NOT run the OMP/OMPH tests as the original OMP acceleration for the source terms has been lost (it was in the W3WAVE loop) and I have not re-implemented it yet in W3SRCE. For me, almost everything is B4B apart from the following:
I think the WW3_TS1 tests will be B4B once issue #1085 is resolved (Mickael has added it to his upcoming ST4 PR). There are a few things left for me to do before this one is fully ready for testing, most importantly sorting out the OMP stuff. I need to do some more performance analysis on the CPU too. I am also aware that @mickaelaccensi's upcoming ST4 PR will likely impact these changes, so some time will be needed to sort out any incoming conflicts anyhow. Getting some early feedback would be good though just to catch anything that my tests might have missed. Thanks!
@ukmo-ccbunney, thank you, sure thing! I'll get the regression tests going now.
@ukmo-ccbunney, I just got the test results, so wanted to share some quick feedback. Below I've organized into three groups: the known non-b4b's, then the non-b4b's you found, and lastly the remaining unexpected (
2023-11-13.matrixCompSummary.txt
That's great - thanks for doing that @MatthewMasarik-NOAA.
Sure thing, you're welcome @ukmo-ccbunney. That's good to know about the ww3_ts1 results. Can I ask how you tested the ww3_ts1 initialization fix? Separately, I got some non-b4b's running the matrix on #1124 code so I was curious if you pulled that in to test.
I manually added the |
Okay, I see. thanks for explaining |
…-waves/WW3 into feature/gpu/w3srce_refactor
Pull Request Summary
Major refactor of `W3SRCE` subroutine to calculate source terms for arrays of spectra rather than a single spectrum. Intended to facilitate acceleration on GPU architectures. See presentation in discussion #736 for more details.
Description
The purpose of this change is to address the poor performance of the WW3 source term calculations when running on a GPU. The current implementation of the source term routines processes a single spectrum at a time, which does not expose sufficiently large arrays/loops to fully subscribe the large number of parallel threads available on a GPU, resulting in poor performance.
The proposed solution to this is to refactor the source term packages to process arrays of spectra, rather than a single spectrum. This provides a large independent seapoint dimension to fully utilise the available GPU threads.
The first step in achieving this is to refactor the `W3SRCE` routine to accept arrays with a seapoint dimension, rather than values for a single seapoint. This moves the NSEA loop one level down from `W3WAVE` into `W3SRCE`. This will allow for the second stage of this refactoring (which will happen at a later date), in which individual source term routines will be parallelised over multiple seapoints. However, for this change, source term subroutine calls will be wrapped in a seapoint loop to maintain their current functionality.

This first set of changes does not add any GPU acceleration - it is merely paving the way to facilitate GPU acceleration in the next phase. There should be no change to the model outputs and the performance impact on the CPU should be minimal.
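The loop relocation described above can be sketched in a few lines. This is only an illustration in Python, not WW3 code: `source_term_point`, `source_terms_all`, `driver_before` and `driver_after` are hypothetical names, and the "physics" is a placeholder damping factor. It shows the caller handing over full arrays while the per-point routine is still called inside a seapoint loop, so results stay identical.

```python
# Illustrative sketch only: all names are hypothetical stand-ins, not WW3 routines.

def source_term_point(spectrum, depth, dt):
    """Stand-in for a per-seapoint source term routine (placeholder physics)."""
    decay = 1.0 / (1.0 + dt / depth)
    return [energy * decay for energy in spectrum]


def driver_before(spectra, depths, dt):
    """Old layout: the caller owns the seapoint loop, one call per point."""
    return [source_term_point(spectra[isea], depths[isea], dt)
            for isea in range(len(spectra))]


def source_terms_all(spectra, depths, dt):
    """New layout: the routine receives full arrays and owns the seapoint loop.

    The per-point routine is still called one point at a time; a later stage
    could replace this loop with a kernel that is parallel over seapoints.
    """
    out = []
    for isea in range(len(spectra)):
        out.append(source_term_point(spectra[isea], depths[isea], dt))
    return out


def driver_after(spectra, depths, dt):
    """New layout: a single call, with no seapoint loop in the caller."""
    return source_terms_all(spectra, depths, dt)


spectra = [[1.0, 2.0, 3.0], [0.5, 0.25, 0.125]]
depths = [10.0, 50.0]
# Same operations in the same order, so the two layouts agree exactly.
assert driver_before(spectra, depths, 60.0) == driver_after(spectra, depths, 60.0)
```

Because the arithmetic per seapoint is unchanged and executed in the same order, the two drivers agree exactly in this sketch, which is the property the PR refers to as B4B.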
There are a few points of consideration to this change:
Summary of important changes:
- `W3WAVE` now passes full `NSEA(L)` dimensioned arrays to `W3SRCE`.
- `NSEA(L)` loop now done in `W3SRCE`.
- `VAOLD`/`VAoldDummy` not used in W3SRCE; removed. `VAOLD` used by PDLIB, but not in w3srce.
- `DELX`, `DELY` and `DELA` not used. Removed.
- `REFLEC` and `REFLECD` scalars removed - passed in `REFL[CD]` arrays explicitly instead. Calculation now done in w3srce.
- `D50` and `PSIC` scalars removed. Passing whole arrays.
- `TMP[1234]` scalars removed and passed in arrays direct.
- `VSIO`/`VDIO`/`SHAVEIO` now optional arguments (only for PDLIB).
- `LSLOC` test on `VSIO`/`VDIO` now done in w3srce.
- `USTAR`
- `WHITECAP`
- `TAUICE`
- `TAUBBL`
- `BEDFORMS`
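Two of the interface changes listed above, passing whole per-seapoint arrays instead of scalars and making some outputs optional, can be illustrated with a small sketch. This is a Python analogy under stated assumptions, not the Fortran interface: the function `source_terms`, its argument order and the placeholder arithmetic are all hypothetical, and only the argument names echo the list.

```python
# Illustrative sketch only: a hypothetical interface, not the real W3SRCE signature.

def source_terms(spectra, d50, psic, vsio=None, vdio=None):
    """Compute a placeholder total per seapoint from whole-array inputs.

    d50, psic : per-seapoint arrays, indexed inside the routine rather than
                being passed as scalars by the caller.
    vsio, vdio: optional output lists, filled only when the caller supplies
                them (in the PR these are only needed for PDLIB).
    """
    if (vsio is None) != (vdio is None):
        raise ValueError("vsio and vdio must be supplied together")
    want_io = vsio is not None  # decision made inside the routine, not by the caller

    totals = []
    for isea, spectrum in enumerate(spectra):
        source = [d50[isea] * energy for energy in spectrum]  # placeholder term
        diag = [-psic[isea] for _ in spectrum]                # placeholder term
        totals.append(sum(source))
        if want_io:
            vsio[isea] = source
            vdio[isea] = diag
    return totals


# Caller that does not need the optional outputs.
totals = source_terms([[1.0, 2.0]], d50=[2.0], psic=[0.5])

# Caller that does (pre-allocates one slot per seapoint).
vsio, vdio = [None], [None]
source_terms([[1.0, 2.0]], d50=[2.0], psic=[0.5], vsio=vsio, vdio=vdio)
```

Keeping the check on the optional outputs inside the routine means callers that do not need them pass nothing extra, which mirrors the intent of the list items on `VSIO`/`VDIO`.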
Commit Message
Refactor of W3SRCE to allow source term for arrays of seapoints to be calculated rather than a single seapoint.
To facilitate later acceleration of source terms on GPU architecture.
Check list
Testing